GMM based clustering and speaker separability in the Timit speech database

Authors

  • Andrew Morris
  • Dalei Wu
  • Jacques Koreman
Abstract

Speaker recognition on the 630-speaker Timit speech database, using maximum probability selection with a simple Gaussian Mixture Model (GMM) for the data distribution for each speaker, gives above 99% correct recognition. In contrast, a powerful classifier such as a Multi-Layer Perceptron (MLP), trained to estimate speaker probabilities, even on a small subset of speakers, often performs no better than random selection. We hypothesise two effects which could combine to produce this situation. MLPs do badly because the acoustic feature data is primarily clustered around phonemes, so that speaker classes are highly fragmented and interspersed. In contrast, GMMs model speaker data distributions well because variation within the phonetic cluster identified by each Gaussian is primarily due to speaker variation, with the result that when speaker models are trained by adapting only the means from a multi-speaker world model, the resulting GMMs are highly discriminative between speakers. In this article we analyse the distribution of speech and speaker information, both overall and within the cluster identified by each Gaussian in a GMM tuned for speaker recognition on Timit. We show that the results of this analysis support the above hypotheses, and then discuss ways in which the enhanced speaker separability within each Gaussian cluster could be used to harness the discriminative power of MLPs to provide feature data enhancement and improved speaker identification.
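The two mechanisms named in the abstract, mean-only adaptation from a multi-speaker world model and maximum probability selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary layout, the function names, the relevance factor of 16 (a common choice in relevance-MAP adaptation, assumed here), and the two-cluster toy data standing in for phonetic clusters are all assumptions made for illustration.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_log_joint(x, gmm):
    """log(w_k * N(x | mean_k, var_k)) for every component k."""
    return [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def avg_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of the frames under the GMM."""
    total = sum(logsumexp(component_log_joint(x, gmm)) for x in frames)
    return total / len(frames)


def adapt_means(world, frames, relevance=16.0):
    """Adapt only the means of the world model to one speaker's frames.

    Assumed scheme: relevance-MAP interpolation between the world mean and
    the posterior-weighted data mean. Weights and variances stay shared.
    """
    k = len(world["weights"])
    dim = len(world["means"][0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in frames:
        lj = component_log_joint(x, world)
        norm = logsumexp(lj)
        for c in range(k):
            post = math.exp(lj[c] - norm)  # responsibility of component c
            counts[c] += post
            for d in range(dim):
                sums[c][d] += post * x[d]
    new_means = []
    for c in range(k):
        n = counts[c]
        alpha = n / (n + relevance)
        adapted = []
        for d in range(dim):
            data_mean = sums[c][d] / n if n > 0.0 else world["means"][c][d]
            adapted.append(alpha * data_mean + (1.0 - alpha) * world["means"][c][d])
        new_means.append(adapted)
    return {"weights": world["weights"], "means": new_means, "vars": world["vars"]}


def identify(frames, speaker_models):
    """Maximum probability selection: the best-scoring speaker model wins."""
    return max(
        speaker_models,
        key=lambda name: avg_log_likelihood(frames, speaker_models[name]),
    )


def make_frames(rng, centers, n):
    """Toy feature frames: alternate between clusters, add Gaussian noise."""
    return [
        [rng.gauss(ci, 0.3) for ci in centers[i % len(centers)]]
        for i in range(n)
    ]


# Hypothetical toy setup: two "phonetic" clusters, with each speaker shifted
# slightly inside every cluster.
rng = random.Random(0)
world = {
    "weights": [0.5, 0.5],
    "means": [[0.0, 0.0], [5.0, 5.0]],
    "vars": [[1.0, 1.0], [1.0, 1.0]],
}
centers = {
    "spk_a": [[0.5, 0.5], [5.5, 5.5]],
    "spk_b": [[-0.5, -0.5], [4.5, 4.5]],
}
models = {
    name: adapt_means(world, make_frames(rng, c, 200))
    for name, c in centers.items()
}
test_a = make_frames(rng, centers["spk_a"], 50)
test_b = make_frames(rng, centers["spk_b"], 50)
```

Calling `identify(test_a, models)` on this toy data should select the model adapted on the matching speaker. The speakers differ only by a small offset inside each cluster, which is the kind of within-cluster speaker variation the abstract argues mean-adapted GMMs exploit.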

Similar resources

Text-independent speaker identification using vocal tract length normalization for building universal background model

In this paper, we propose to use Vocal Tract Length Normalization (VTLN) to build the Universal Background Model (UBM) for a closed-set speaker identification system. Vocal Tract Length (VTL) differences among speakers are a major source of variability in the speech signal. Since the UBM is trained using data from many speakers, it statistically captures this inherent variation in the spee...

Speaker Recognition and Broad Phonetic Groups

The aim of this study is to provide a quantitative assessment of the speaker-discriminating properties of broad phonetic groups. A GMM-based approach to speaker modelling is used in conjunction with a phonetically hand-labelled speech database (TIMIT) to produce a ranking of broad phonetic groups based on speaker identification scores. The broad phonetic groups nasals and vowels were found to be particu...

Performance Analysis of Speaker Identification System Using GMM with VQ

Verifying personal identity is an important requirement for controlling access to protected resources. Biometric identification, which uses certain features of a person, is a more secure solution. Advances in speech processing technology and digital signal processors have made possible the design of high-performance and practical speaker recognition systems. A more...

Performance Evaluation of Statistical Approaches for Text Independent Speaker Recognition Using Source Feature

This paper evaluates the performance of statistical approaches to text-independent speaker recognition using the source feature. The linear prediction (LP) residual is used as a representation of excitation information in speech. The speaker-specific information in the excitation of voiced speech is captured using statistical approaches such as Gaussian Mixture Models (GMMs) and Hid...

Training GMMs for Speaker Verification

An established approach to training Gaussian Mixture Models (GMMs) for speaker verification is via the expectation-maximisation (EM) algorithm. The EM algorithm has been shown to be sensitive to initialisation and prone to converging on local maxima. In exploration of these issues, three different initialisation methods are implemented, along with a split and merge technique to ‘pull’ the train...

Publication date: 2004